We have previously cloned and characterized the murine
homologue of the Huntington disease (HD) gene and shown
that it maps to mouse chromosome 5 within a region of
conserved synteny with human chromosome 4p16.3. Here we
present a detailed comparison of the sequence of the
putative promoter and the organization of the 5' genomic
region of the murine (Hdh) and human HD genes encompassing
the first five exons. We show that in this region these two
genes share identical exon boundaries but have introns of
different sizes. Two dinucleotide (CT) and one trinucleotide
intronic polymorphism in Hdh and an intronic CA polymorphism
in the HD gene were identified. Comparison of 940 bp of
sequence 5' to the putative translation start site reveals a
highly conserved region (78.8% nucleotide identity) between
Hdh and the HD gene from nucleotide -56 to -206 (of Hdh).
Neither Hdh nor the HD gene has typical TATA or CCAAT
elements, but both show one putative AP2 binding site and
numerous potential Sp1 binding sites. The high sequence
identity between Hdh and the HD gene for approximately
200 bp 5' to the putative translation start site indicates
that these sequences may play a role in regulating
expression of the Huntington disease gene.

© 1995 Academic Press, Inc.



INTRODUCTION 

Huntington disease (HD) is an autosomal dominant
neurodegenerative disorder characterized by involuntary
movements, psychological disturbance, and cognitive decline
that usually manifests in mid-life (Hayden, 1981; Harper,
1991). Recently, a novel gene containing a CAG trinucleotide
repeat that is expanded on HD chromosomes was identified
(HDCRG, 1993). This gene encodes two messenger RNAs that are
widely expressed in different tissues but with varying
abundance (Lin et al., 1993; Li et al., 1993; Strong et al.,
1993; Ambrose et al., 1994). We and others have cloned the
mouse homologue (Hdh) of the human HD gene (Lin et al., 1994;
Barnes et al., 1994) and mapped it to chromosome 5 within a
region of conserved synteny with human 4p16.3 (Nasir et al.,
1994). However, little is known regarding either the function
of the HD gene product or the regulation of expression of the
HD gene.

1 To whom correspondence should be addressed at the Department
of Medical Genetics, University of British Columbia, 416-2125 East
Mall, NCE Building, Vancouver, B.C., Canada V6T 1Z4. Telephone:
(604) 822-9240. Fax: (604) 822-9238.

As a first step to further our understanding of the
organization and regulated expression of the HD gene, we
have cloned the mouse genomic regions containing the first
five exons, including exon 1, which contains the CAG repeat,
and determined their genomic organization. We have also
conducted a detailed structural comparison of the 5'
upstream sequences, including the putative promoter region,
of the human and the mouse HD genes.

Both the CAG repeat and the CCG repeat immediately
following the CAG repeat in the HD gene have been shown to
be polymorphic in the general population (Andrew et al.,
1994; Kremer et al., 1994; Rubinsztein et al., 1994). The CCG
repeat in the HD gene varies from 6 to 12, and the CAG repeat
length varies from 9 to 35 on normal chromosomes (Kremer et
al., 1994). We have previously shown that the CCG repeat
between nucleotide positions 211 and 223 of Hdh is
polymorphic between two different mouse strains (Lin et al.,
1994). There are three CCG repeats in the ICR outbred strain
(Lin et al., 1994) and in the wild mouse Mus spretus (Barnes
et al., 1994), but four CCG repeats in the 129J, C57BL/6J
(Lin et al., 1994), PCC4, and CBA strains (Barnes et al.,
1994). In contrast to the murine CCG repeat, the adjacent
(CAG)2CAA(CAG)4 is not polymorphic in five strains of mice
(129J, PCC4, CBA, C57BL/6J, and ICR outbred) (Lin et al.,
1994; Barnes et al., 1994).

Comparison of about 940 bp of sequence 5' of the putative
translation start site has, however, identified a highly
conserved region between the Hdh and the HD genes from -56
to -206 with a nucleotide identity of 78.81%, suggesting that
these sequences in the 5' flanking regions of both the Hdh
and the HD genes may be important in the regulated
expression of these genes.

FIG. 1. Schematic map showing the comparative genomic organization of the 5' region of Hdh and the HD gene. Black boxes represent
exons. M42d, M51, and H5 are phages isolated from a mouse genomic DNA library, while L191F1, L83D3, and L228B6 (HDCRG, 1993)
are cosmids from the chromosome 4-specific cosmid genomic library (courtesy of Lawrence Livermore National Laboratory). Microsatellite
repeats are indicated by arrows.

MATERIALS AND METHODS 

Isolation of genomic clones for the 5' region of the Hdh and HD
gene. A total phage genomic library of mouse strain 129J was plated
at high density (200,000 pfu per 24 × 24-cm bioassay dish) onto NZY
media. Two sets of replica filters were made from each plate using
Hybond-N+ nylon filters (Amersham). The filters were immersed in
denaturing solution (1.5 M NaCl, 0.5 M NaOH) for 30 s, in
neutralization solution [1.5 M NaCl, 0.5 M Tris-HCl (pH 8.0)] for
30 s, and in 2× SSC for 30 s and then baked at 80°C for 2 h.

After secondary and tertiary screening, the positive plaques were
picked. DNA from these positive phage was extracted, digested with
different enzymes, and transferred onto Hybond-N+ nylon filters
(Amersham). Genomic fragments containing exons were identified
by hybridization with Hdh cDNA or primers designed from the
sequences of the cDNA (Lin et al., 1994).

Human cosmids L191F1, L83D3, and L228B6 (HDCRG, 1993)
encompassing the 5' genomic region of the HD gene were picked from a
gridded human chromosome 4-specific library (cell source: UV20
HL21-27, hamster-human hybrid cell lines containing human
chromosomes 4, 8, and 21), and genomic cosmid blots were made after
DNA from these cosmids was digested with EcoRI, HindIII, and PstI.
The filters were hybridized with primers designed from the human
HD cDNA to map the exons.

Prehybridization and hybridization. Prehybridization and
hybridization were performed in Church buffer (0.5 M sodium
phosphate buffer, pH 7.2, 7% SDS, and 1 mM EDTA) at 65°C (Church
and Gilbert, 1984). After hybridization, filters were washed gradually
to a final stringency of 1× SSPE (0.18 M NaCl, 0.01 M NaH2PO4, 1
mM EDTA, pH 7.7) and 0.1% SDS at 65°C for 20 min. Autoradiography
was carried out for 12-24 h at -70°C. Positive clones were
purified following secondary and tertiary screening.

DNA sequencing and analysis. Plasmid DNA was prepared using
a plasmid DNA preparation column (Qiagen). Automated sequencing
was performed using the ABI 373A sequencer. Manual dideoxy
sequencing was performed using the Sequenase kit (US Biochemicals).
Sequencing and PCR primers were synthesized with a PCR Mate
391 DNA synthesizer (Applied Biosystems, Inc.).

Sequence data were entered into a Sun IPX Workstation and
analyzed with the Staden sequence analysis package (Dear and Staden,
1991). The commercial sequence analysis programs MacVector
(International Biotechnologies, Inc.) and GeneWorks (IntelliGenetics
Inc.) were also used for sequence analyses and for identification of
potential cis-regulatory elements within the mouse and human HD
putative promoters.

Assessment of polymorphisms. Further sequence analysis of the
mouse gene has revealed three additional microsatellite repeats in
the introns of Hdh: the (AAG/AAGG)56 repeat and the (CT)20 repeat
in intron 2 and the (CT)27 repeat in intron 4 (Fig. 1). The sequences
of these repeats and their flanking regions have been deposited in
GenBank under Accession Nos. L34021 through L34023.

We have designed PCR primers to assess for length polymorphisms
of these repeats in several different strains of laboratory mice (Mus
musculus), including 129J, Nude, Scid, Balb/c, B10.S, CBA, and
C57BL/6J, and in a wild species of mouse, Mus spretus. The primers are
5'-CCAGTACCAGGACCTACACAAAG-3' (forward) and
5'-CACTACCACAGCCCAGCAAC-3' (reverse) for the (AAG/AAGG)56 repeats;
5'-TTGCATTTTGTCATCAGTTCCTCC-3' (forward) and
5'-TAGCGATGCCCAAGAGTTTCTGA-3' (reverse) for the (CT)20 repeats; and
5'-GAGCAGCAAGTGCTCTTAACCC-3' (forward) and
5'-CAGAGCTGGCTGGGGTCATG-3' for the (CT)27 repeats. The sequence of the
repeats is shown in Fig. 2.

The expected sizes of the PCR products for the 129J strain of mouse
(Table 1) are 285 bp for the (AAG/AAGG)56 repeats, 137 bp for the
(CT)20 repeats, and 167 bp for the (CT)27 repeats. A total of 4 pmol
of one primer of each PCR primer set was end-labeled separately
with [γ-32P]ATP and T4 polynucleotide kinase in a 25-µl reaction. In
each PCR, 0.5 µl of the labeled primer was included. Each PCR
was performed with 0.1 µg of genomic DNA, 0.4 pmol of
each primer, 0.12 mM dNTPs, 1.5 mM MgCl2, and 1× PCR buffer
(50 mM KCl, 10 mM Tris-HCl, pH 9.0, at 25°C and 0.1% Triton X-100)
for 35 cycles. Thermal cycling conditions were 94°C 30 s, 56°C
30 s, and 72°C 30 s for the (AAG/AAGG)56 repeats; 94°C 30 s, 62°C
30 s, and 72°C 30 s for the (CT)20 repeats; and 94°C 30 s, 64°C 30 s,
and 72°C 30 s for the (CT)27 repeats. After amplification, 4 µl of each
of the PCR products was mixed with 4 µl of the formamide loading dye,
denatured at 80°C for 2 min, loaded onto a 6% polyacrylamide gel,
and run together with a sequencing reaction as size marker. The gel
was then dried and autoradiographed.

Material may be protected by copyright law (Title 17, U.S. Code)

FIG. 2. (A) Sequence of the (AAG/AAGG)56 repeats. The PCR primer sequences for amplifying the repeats are in lowercase and indicated
by arrows, while the repeat sequences themselves are underlined. (B) Sequence of the (CT)20 repeats. (C) Sequence of the (CT)27 repeats.
(D) Polymorphism of the AAG/AAGG repeat in different mouse strains. The repeat sequences in the box are polymorphic. The numbers
under the boxed repeat sequence indicate the number of the corresponding repeat. The polymorphism is polar and seen in the 5' portion
of the AAG/AAGG repeat but absent in the 3' portion of the repeat in five different strains of mice (129J, C57BL/6J, B10.S, Scid, and
Balb/c).

TABLE 1

CT Repeat Polymorphisms in the Hdh Gene
in Different Strains of Mice

                   Number of CT repeats
Strains            Intron 2      Intron 4
129J               20            27
Nude               22            25
Scid               21            27
Balb/c             21            27
B10.S              20            26
CBA                21            26
C57/BL             20            26
Mus spretus        18            12

RESULTS

Comparison of the 5' Upstream Sequences of the Hdh
and HD Genes

We have subcloned a 4.1-kb EcoRI fragment, containing
the putative promoter region and exon 1 of the HD gene,
and a 2.6-kb XbaI fragment containing the putative Hdh
promoter region. We have sequenced the entire 4.1-kb EcoRI
fragment including 3580 bp upstream of the putative ATG
translation start codon of the HD gene (GenBank Accession
No. L34020). Similarly, we sequenced the promoter region of
Hdh (GenBank Accession No. L34008) including about 930 bp
upstream of the putative ATG translational start site.

Sequence alignment of the 5' region upstream of the
putative ATG codon of Hdh and HD is shown in Fig. 3. This
alignment reveals a highly conserved region between Hdh and
the HD gene from -56 to -206 (numbered after Hdh) with a
nucleotide identity of 78.81%. Identity is lower between -55
and +1 (translation start site) (54% nucleotide identity),
and the upstream sequence from -207 to -930 of Hdh shows
only about 50% nucleotide identity. Within the conserved
region, one cAMP-responsive element (TGACGTCA) (Andrisani
et al., 1988) is present at position -180 to -174 in Hdh but
not in the HD gene.
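
The identity figures quoted above can be reproduced from any pairwise alignment by counting matching columns. The following is a minimal sketch in Python, not the procedure used by GeneWorks; the function name and the toy sequences are illustrative, and the treatment of gap columns is an assumption. As a consistency check, the region from -206 to -56 spans 151 positions, so 119 identical positions (an inferred count, not one stated in the text) would give the reported 78.81%.

```python
def percent_identity(aligned_a, aligned_b, count_gap_columns=True):
    """Percent identical columns between two aligned sequences.

    Both strings must be the same length and use '-' for gaps.
    If count_gap_columns is False, columns containing a gap are
    left out of the denominator (an assumption; conventions differ).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        has_gap = a == "-" or b == "-"
        if has_gap and not count_gap_columns:
            continue
        columns += 1
        if not has_gap and a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not taken from Fig. 3).
    print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
    # Arithmetic check of the reported figure: 119 matches in 151 positions.
    print(round(100.0 * 119 / 151, 2))
```

Counting gap columns in the denominator gives the more conservative figure; excluding them raises the identity slightly for gapped alignments.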

Sequence analyses of the 5' region of the HD gene have
also revealed two 20-bp direct repeats
(GGCCCCGCCTCCGCCGGCGC) at positions -212 to -193 and -192
to -173 with only one mismatch (Figs. 3 and 4). However,
these repeats are not present in the 5' upstream sequences
of Hdh. We have also identified two direct 17-bp repeats
(CCACGCGCCCGGCATCG) at positions -516 to -500 and -499 to
-483 in the HD gene (Figs. 3 and 4). These two 17-bp repeats
are flanked by CCACGCC repeats that are identical to the
first 7 bp of the 17-bp repeats.
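
Adjacent direct repeats of this kind can be located by comparing each window of a given length with the window that immediately follows it. The sketch below is an illustration only, not the method used in this study; the function name is hypothetical, the flanking bases AT and TA are invented for the example, and the 20-bp unit is the one quoted in the text.

```python
def find_tandem_direct_repeats(seq, unit_len, max_mismatches=0):
    """Find positions where a unit_len window is followed directly by
    a copy of itself, allowing up to max_mismatches differences.

    Returns a list of (start_index, first_copy, second_copy, mismatches)
    with 0-based start indices.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * unit_len + 1):
        first = seq[i:i + unit_len]
        second = seq[i + unit_len:i + 2 * unit_len]
        mismatches = sum(1 for a, b in zip(first, second) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, first, second, mismatches))
    return hits


if __name__ == "__main__":
    unit = "GGCCCCGCCTCCGCCGGCGC"      # 20-bp repeat unit from the text
    example = "AT" + unit + unit + "TA"  # flanking bases are hypothetical
    for start, first, second, mismatches in find_tandem_direct_repeats(example, 20):
        print(start, first, second, mismatches)
```

Raising `max_mismatches` to 1 would also admit imperfect copies, at the cost of reporting the windows shifted by one base on either side of the true repeat.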

Assessment for transcription factor binding motifs using
a commercially available software package (MacVector 3.5)
revealed a conserved AP2 (CCGCAGGC) site at position -248
to -240 in the HD gene and -270 to -262 in Hdh (Fig. 3).
There are also 5 potential Sp1 binding sites (GGGCGG) at
positions -299 to -293, -318 to -312, -374 to -368, -379 to
-373, and -427 to -421 in Hdh and 11 potential Sp1 sites in
the HD gene at positions -15 to -9, -284 to -278, -453 to
-447, -541 to -535, -571 to -565, -587 to -581, -592 to
-586, -638 to -632, -643 to -637, -654 to -648, and -706 to
-700. However, only 1 of these Sp1 sites is conserved
(Fig. 3).
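
A consensus-motif search of this kind reduces to exact string matching on both strands. The sketch below is a simplified stand-in for the MacVector analysis, whose matching rules are not described here; the function names and the test sequence are hypothetical, and only the GGGCGG (Sp1) and CCGCAGGC (AP2) consensus strings come from the text.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (start, strand) pairs for every exact occurrence
    of motif, including overlapping ones. Start is the 0-based index
    on the forward strand; strand '-' means the reverse complement
    of the motif was found at that index.
    """
    seq = seq.upper()
    motif = motif.upper()
    patterns = {"+": motif}
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        patterns["-"] = rc
    hits = []
    for strand, pattern in patterns.items():
        # A zero-width lookahead lets overlapping matches be reported.
        for match in re.finditer("(?=" + re.escape(pattern) + ")", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the Hdh or HD promoters.
    example = "TTGGGCGGAACCGCCCTTCCGCAGGCAA"
    print("Sp1:", find_motif(example, "GGGCGG"))
    print("AP2:", find_motif(example, "CCGCAGGC"))
```

Converting the 0-based indices to the negative coordinates used in the text would require the index of the ATG in the sequence being scanned, which is not assumed here.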

Two tandem head-to-tail Alu repeat sequences were
identified in the HD gene, both in the opposite orientation
to the transcription of the HD gene. One, starting at -2099,
is a full-length Alu repeat belonging to the Alu-Sx
subfamily (flanked by CTGGGAACTT direct repeats), as
detected after searching the Alu databases with the PYTHIA
server (Milosavljevic and Jurka, 1993; Jurka and
Milosavljevic, 1991; Hutchinson et al., 1993). The other
Alu, starting at nucleotide position -1723, is a truncated
(half) Alu repeat retaining only the 3' end of the Alu
sequence and belongs to the Alu-J subfamily (Jurka and
Milosavljevic, 1991).

Comparative Genomic Organization of the First 5 
Exons of the HD Gene and Hdh 

A phage contig encompassing the first 5 exons of Hdh
was constructed (Fig. 1). Phage H5 was initially isolated
by hybridization with an Hdh cDNA probe, MHD2 (Lin et al.,
1994). A 3.5-kb SphI fragment near the 5' end of phage H5
was used to initiate a chromosome walk to extend the contig.
A second positive phage clone (M51) was isolated, but this
clone also did not contain the CAG repeat. Thereafter, a PCR
product spanning nucleotides 1 to 123 of the Hdh cDNA was
used to screen the genomic library, and a third phage clone
(M42A-D) was isolated and shown to contain the CAG exon by
hybridization to the CAG repeat.

Exon-containing fragments were identified by
hybridization with primers derived from Hdh cDNA sequences.
These fragments were then subcloned and sequenced (GenBank
Accession Nos. L34008 and L34021 to L34024). The intron/exon
junction sequences of the first 5 exons of Hdh all conform
to the GT-AG rule (Padgett et al., 1986) (Table 2). The
first 5 nucleotides of the 5' donor splice sites are
identical between Hdh and HD, while only the AG in the 3'
acceptor splice sites is identical (Table 2).
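
The GT-AG check and the donor-site comparison can be verified directly from the splice-site sequences listed in Table 2. The sketch below is illustrative only. The sequences are transcribed from Table 2 with internal spaces removed; pairing each intron's two acceptor sequences as (Hdh, HD) follows the row order of the table and is an assumption.

```python
# Splice-site sequences from Table 2 as (Hdh, HD) pairs.
DONOR_SITES = {
    1: ("GTGAGTCCGGGCGCCGCAGCTC", "GTGAGTTTGGGCCCGCTGCAGC"),
    2: ("GTAATTGGCTTTTTAAAAAAAA", "GTAATTGCACTTTGAACTGTCT"),
    3: ("GTAAGCGCCCCATAATGATGAT", "GTAAGAACCGTGTGGATGATGT"),
    4: ("GTGGGTGTTTGCTCTGCATTAT", "GTGGGCCTTGCTTTTCTTTTTT"),
    5: ("GTAAGTTGTACCTCTGTATTATTTTTAAGA", "GTAAGTTGTACACTCTGGATGTTGGTTTTT"),
}
ACCEPTOR_SITES = {
    1: ("TTTTCCTCTTGTTTTTTTGTAG", "TCCTTCTTTTTTTTATTTTTAG"),
    2: ("TCTCTCTCTCTTTTTTACTTAG", "TTTCTCTTCTTTTTTTGCTTAG"),
    3: ("AGTCTCTTCTATTTCTTTGCAG", "AATCTCTTGTGATTTGTTGTAG"),
    4: ("ATCACTTGTTAACTCCACTTAG", "AACCCTCATTGCACCCCCTCAG"),
}


def obeys_gt_ag(donor, acceptor):
    """True if the intron starts with GT and ends with AG."""
    return donor.upper().startswith("GT") and acceptor.upper().endswith("AG")


def shared_prefix_len(a, b):
    """Number of leading positions at which two sequences agree."""
    n = 0
    for x, y in zip(a.upper(), b.upper()):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    for intron, (hdh_acc, hd_acc) in ACCEPTOR_SITES.items():
        hdh_don, hd_don = DONOR_SITES[intron]
        print(intron, obeys_gt_ag(hdh_don, hdh_acc), obeys_gt_ag(hd_don, hd_acc))
    for exon, (hdh_don, hd_don) in DONOR_SITES.items():
        print(exon, shared_prefix_len(hdh_don, hd_don))
```

The shared-prefix lengths give a quick check of the statement that at least the first 5 donor nucleotides agree between the two species for every exon listed.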

Exons 2 to 5 of Hdh are identical in size to those of
the HD gene, with donor and acceptor splice sites at the
same homologous positions in the HD cDNA sequence (Ambrose
et al., 1994). Furthermore, these exons are highly conserved,
with nucleotide identities of 93, 91, 92, and 95% for exons
2, 3, 4, and 5, respectively. However, exon 1 of Hdh and
exon 1 of the HD gene differ in size and are divergent in
sequence, not only because of differences in the length of
the CAG repeat (7 in mouse and a mean of 18 in human) but
also because the HD gene has 9 extra CCN (N = G, C, or A)
repeats and other nucleotide sequences (Lin et al., 1994).

Intron sizes differ markedly between the HD gene and
Hdh (Table 2 and Fig. 1). Furthermore, the introns differ
substantially in sequence. Comparison of the intron
sequences of Hdh with those of the HD gene, including 247 bp
of the 5' end of intron 1 and other limited amounts (60 bp)
of intron sequence of the HD gene available in GenBank
(GenBank Accession Nos. L27350-L27354) (Ambrose et al.,
1994), reveals only approximately 50% nucleotide identity,
excluding the GT-AG splice junction signals. We have also
identified an L1 repeat truncated at the 5' end in intron 2
of Hdh, oriented in the opposite direction to the direction
of transcription of Hdh (Fig. 1).

FIG. 3. Sequence alignment of the 5' region upstream of the putative translation start codon between Hdh (GenBank Accession No. L34008)
and HD (GenBank Accession No. L34020). Identical sequences are boxed. The alignment was performed with the DNA sequence analysis
software GeneWorks 2.5 (IntelliGenetics). AP2 and Sp1 sites and cAMP-responsive element binding sites (CRE) are shown in the gray
boxes.



Comparison of Three Microsatellite Repeats in the
Introns of the Hdh and HD Genes

[Figure 4 schematic. Labels recoverable from the figure: the 20-bp direct repeat of the HD gene (-212 to -173, GGCCCCGCCTCCGCCGGCGCGGCCCCGCCTCCGCCGGCGC), the 17-bp direct repeat, Sp1 sites, the ATG, conserved sites, direct repeats, and a 100-bp scale bar.]

FIG. 4. Schematic map showing the relative positions of the putative transcription factor binding sites in the 550 bp upstream of the
putative translation start site of Hdh and HD. The published Hdh and human HD cDNAs are represented by arrows. The direct repeat
sequences found in the HD gene are indicated by dashed arrows. Conserved Sp1 and AP2 sites are indicated by arrowheads. The region of
conservation is indicated by a gray box. The putative translation start site (ATG) is designated -1.

We have identified three microsatellite repeats in Hdh
(Fig. 1). They are (AAG/AAGG)56 repeats and (CT)20 repeats
in intron 2 and (CT)27 repeats in intron 4. The AAG repeat
is interrupted 20 times by AAGG following 18 perfect AAG
repeats. The (CT)27 repeats are preceded by a
(CTTCT)2(CTCTT)3 repeat. These microsatellite repeats are
not present in the homologous region of the HD gene, as the
oligonucleotide primers corresponding to those repeats
failed to hybridize to the DNAs from cosmids spanning this
genomic region. A total of four different alleles were
identified both for the (CT)20 repeats and for the (CT)27
repeats (Table 1). The allele sizes of all three
microsatellite repeats are the smallest for the wild species
of mouse, Mus spretus.
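
The allele counts for the two CT repeats can be tallied directly from Table 1, and the repeat numbers can be converted to approximate PCR product sizes from the 129J reference sizes given in Materials and Methods (137 bp at 20 repeats for intron 2, 167 bp at 27 repeats for intron 4). The sketch below is illustrative only. It assumes that product length changes by exactly 2 bp per CT unit and that the flanking sequence is constant across strains, which the text does not state.

```python
# (intron 2, intron 4) CT repeat numbers per strain, from Table 1.
CT_REPEATS = {
    "129J": (20, 27),
    "Nude": (22, 25),
    "Scid": (21, 27),
    "Balb/c": (21, 27),
    "B10.S": (20, 26),
    "CBA": (21, 26),
    "C57/BL": (20, 26),
    "Mus spretus": (18, 12),
}
COLUMN = {"intron2": 0, "intron4": 1}
# 129J reference: (repeat number, expected PCR product size in bp).
REFERENCE = {"intron2": (20, 137), "intron4": (27, 167)}


def distinct_alleles(locus):
    """Sorted list of distinct repeat numbers observed at a locus."""
    column = COLUMN[locus]
    return sorted({counts[column] for counts in CT_REPEATS.values()})


def expected_product_bp(locus, n_repeats, unit_len=2):
    """Approximate product size, assuming only the repeat number varies."""
    ref_repeats, ref_bp = REFERENCE[locus]
    return ref_bp + unit_len * (n_repeats - ref_repeats)


if __name__ == "__main__":
    for locus in COLUMN:
        alleles = distinct_alleles(locus)
        sizes = [expected_product_bp(locus, n) for n in alleles]
        print(locus, alleles, sizes)
```

Under these assumptions the Mus spretus allele at the intron 4 repeat would be about 30 bp shorter than the 129J allele.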

We have also sequenced the (AAG/AAGG)56 repeat from six
different inbred mouse strains (Fig. 2D). It is evident that
the polymorphism of the (AAG/AAGG)56 repeats is polar,
arising from variation within the first 18 consecutive AAG
repeats. The first consecutive run of AAG repeats varies
from 20 in the B10.S strain to 18 in the C57BL/6J and 129J
strains and 10 in the Scid and Balb/c strains. An adjacent
AAGGAAGGAAG repeat acts as a longer repeat unit and varies
from 10 repeat units in Balb/c to 9 in Scid and 7 in the
C57BL/6J, B10.S, and 129J strains. The rest of the sequence
[AAGAAGGAAG(AAGAAGG)5(AAG)6] is not polymorphic in the 129J,
B10.S, Scid, C57BL/6J, and Balb/c strains of mice.

We have also identified a polymorphic CA repeat in
intron 1 of the HD gene. Its repeat number is 25 on
cosmid 191F1. Analysis of 44 normal human chromosomes
of Western European origin revealed that the (CA)n
repeat is highly polymorphic with 9 distinct allele sizes
including CA 23 (4.5%), CA 24 (13.6%), CA 25 (6.8%), CA 2S
(20.5%), CA 27 (15.9%), CA 30 (11.4%), CA^ (6.6%), and
CA 32 (4.1%). This CA repeat is also found in intron
1 of Hdh as detected by hybridization with a (CA)10
oligonucleotide.

TABLE 2

Comparison of the Exon and Intron Size and Splice Site Sequences of the First 5 Exons
of the Mouse and Human HD Genes^a

Exon  Gene  Exon size  5' donor site                     Intron  Intron size  3' acceptor site
            (bp)                                                 (kb)
1     Hdh              GTGAGTCCGGGCGCCGCAGCTC            1       ~15          TTTTCCTCTTGTTTTTTTGTAG
1     HD               GTGAGTTTGGGCCCGCTGCAGC            1       ~10          TCCTTCTTTTTTTTATTTTTAG
2     Hdh   84         GTAATTGGCTTTTTAAAAAAAA            2       ~7           TCTCTCTCTCTTTTTTACTTAG
2     HD    84         GTAATTGCACTTTGAACTGTCT            2       ~15          TTTCTCTTCTTTTTTTGCTTAG
3     Hdh   121        GTAAGCGCCCCATAATGATGAT            3       ~5           AGTCTCTTCTATTTCTTTGCAG
3     HD    121        GTAAGAACCGTGTGGATGATGT            3       ~7           AATCTCTTGTGATTTGTTGTAG
4     Hdh   60         GTGGGTGTTTGCTCTGCATTAT            4       ~0.5         ATCACTTGTTAACTCCACTTAG
4     HD    60         GTGGGCCTTGCTTTTCTTTTTT            4       ~0.5         AACCCTCATTGCACCCCCTCAG
5     Hdh   80         GTAAGTTGTACCTCTGTATTATTTTTAAGA
5     HD    80         GTAAGTTGTACACTCTGGATGTTGGTTTTT

^a Donor and acceptor splice sites of HD are from Ambrose et al. (1994).



DISCUSSION 

The high GC content and lack of both typical TATA and
CAAT cis-elements in the 5' flanking regions of Hdh and HD
suggest that they are "housekeeping" genes. Indeed, Hdh and
HD are both abundantly expressed in different tissues (Lin
et al., 1993, 1994; Li et al., 1993; Strong et al., 1993;
Ambrose et al., 1994). The presence of a highly conserved
region in the 5' flanking region between Hdh and HD from
-56 to -206 (78.81% nucleotide identity) suggests that this
region may play a critical role in regulating expression of
the HD gene. In support of this, preliminary mapping of the
transcription initiation sites of both Hdh and HD shows two
major transcription initiation sites at -157 and -146 in
Hdh and -135 and -145 in HD (data not shown).

The 17-bp direct repeats (CCACGCCCCCCGCATCG) and the
two 20-bp perfect repeats (GGCCCCGCCTCCGCCGGCGC) in the
human HD gene may serve as unique binding sites for
trans-acting factors that may either direct transcriptional
initiation or enhance expression of the HD gene. Many direct
repeats described to date are located within well-defined
promoters. For example, the Chinese hamster ovary
dihydrofolate reductase (dhfr) and human low-density
lipoprotein (LDL) receptor promoters contain unique 29- and
16-bp direct repeat sequences, respectively (Mitchell et
al., 1985; Sudhof et al., 1987), necessary for both
transcriptional activation and regulated expression. The
absence of these repetitive elements from the 5' flanking
region of Hdh, however, would suggest that the expression of
these two genes is regulated, in part, by different cis-
and/or trans-regulatory elements. There are four 29-bp
direct repeats found in the hamster and mouse dhfr promoter,
compared to one in the human dhfr homologue, with no obvious
divergent function of the protein (Mitchell et al., 1986).
In contrast, the 16-bp imperfect direct repeats found in the
promoter of the LDL receptor gene have remained highly
conserved in the hamster, mouse, and human, both in
nucleotide sequence and relative position. These three
direct repeats have been shown to serve distinct functions,
two of which are responsible for binding the
trans-regulatory protein Sp1, whereas one acts as a
sterol-responsive element (Bishop, 1992; Sudhof et al.,
1987). The conservation of these repeat motifs in the LDL
receptor promoter points to their importance in controlling
transcription of this gene. In contrast, the lack of
conservation of these repeats within the Hdh and the HD gene
putative promoter regions may represent the evolution of
genes with different patterns of regulation.

The polar variation of the AAG/AAGG repeat in intron 2
of Hdh is similar to that observed in other human
trinucleotide (Kunst and Warren, 1994) and minisatellite
(Armour et al., 1993) repeats. In the AAG/AAGG repeat
reported here, the 5' portion consists of the AAG repeat
without interruption and thus may be more unstable, whereas
the 3' portion consists of interrupted repeats and would
therefore be more stable. It has been suggested that polar
variation at repeat loci might be a general phenomenon in
the human genome and implies that mutation within these
repeats is regulated at least in part by the nature of the
sequence itself (Armour et al., 1993). The polar variation
of this repeat in Hdh suggests that this phenomenon
described in human genes may be more widespread and is also
evident in nonhuman genomes.

An important question is why the murine CAG repeat is
not polymorphic and is considerably smaller than the CAG
repeat in the human gene. In contrast to the CAG repeat in
the HD gene, the CAG repeat in Hdh is cryptic (interrupted
by a CAA repeat after two CAG repeats)
(CAGCAGCAACAGCAGCAGCAG). Kunst and Warren (1994) have
recently demonstrated that an AGG repeat interruption in the
CGG repeat at the FRAXA locus confers some degree of
stability on the CGG repeat in the gene associated with
fragile X syndrome. Similarly, for spinocerebellar ataxia
type 1 (SCA1), an uninterrupted (CAG)n repeat configuration
is seen on unstable alleles in the gene, whereas when CAG is
interrupted by CAT the CAG repeat is more stable (Chung et
al., 1993). This suggests that an important factor that may
be contributing to the stability of the CAG repeat in Hdh is
its cryptic nature, which may at least in part account for
the fact that the CAG repeat number in Hdh (7 CAG repeats)
is significantly less than the CAG repeat number (mean = 18)
in the HD gene. This may also explain why no naturally
occurring murine model for HD has been identified; it may
not exist.
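
The distinction between total repeat number and the longest uninterrupted run can be made concrete with the Hdh repeat sequence quoted above. The sketch below is illustrative only; the function names are hypothetical, and the sequence is assumed to begin in frame with the first CAG. On that assumption the mouse tract contains 7 glutamine codons (CAG or CAA), but its longest pure CAG run is only 4.

```python
def codons(seq):
    """Split a sequence into consecutive triplets, dropping any
    incomplete triplet at the end."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def longest_pure_run(seq, unit="CAG"):
    """Length, in repeat units, of the longest uninterrupted run of unit."""
    best = 0
    run = 0
    for codon in codons(seq):
        run = run + 1 if codon == unit else 0
        best = max(best, run)
    return best


def count_codons(seq, units=("CAG", "CAA")):
    """Number of triplets that belong to the given set of units."""
    return sum(1 for codon in codons(seq) if codon in units)


if __name__ == "__main__":
    hdh_tract = "CAGCAGCAACAGCAGCAGCAG"  # Hdh repeat as quoted in the text
    pure_human = "CAG" * 18              # hypothetical uninterrupted allele of mean length
    print(count_codons(hdh_tract), longest_pure_run(hdh_tract))
    print(count_codons(pure_human), longest_pure_run(pure_human))
```

Comparing the two printed pairs shows how a single CAA interruption shortens the longest pure run well below the nominal repeat number.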

The (CT)20, (CT)27, and (AAG/AAGG)56 repeats identified
in the introns of Hdh were not found in the homologous
region of the HD gene. In contrast, the (CA) repeat was
identified in intron 1 of both the Hdh and the HD genes.
Stallings (1994) compared 10 different trinucleotide repeats
for which the corresponding homologous region sequences were
available between human and rodent and found no conservation
of trinucleotide repeats in similar regions. Also, Stallings
et al. (1991) compared 17 (GT)n repeats between rodents and
humans and found that only 4 of these GT repeats (23.5%) are
localized in the same map location in both species. It is
therefore not surprising that the AAG/AAGG repeat and the
two CT repeats in the introns of Hdh are not present in the
HD gene, as they could have arisen in the mouse gene after
the divergence of primates and rodents. The (CT)27 repeat
will be convenient for use in mapping experiments because
the repeat length differences between Mus musculus and Mus
spretus are on average about 30 and 15 bp, respectively,
which makes them detectable in a regular agarose gel and
eliminates the need to use hot-PCR.

In summary, we have performed structural analyses of the
5' region, including the promoter, of Hdh and HD. Within the
promoter, there is one markedly conserved region (-56 to
-206) between the mouse and human genes, which will now
allow functional analysis of this region to determine its
role in the regulation of this gene. The absence of
conservation of putative transcription factor binding motifs
between HD and Hdh suggests differences in regulation of
these genes in mouse and human.